Optimization #111
Merged
…any) cc3d statistics also computes centroids+bboxes that np_volume discards. For many-label arrays (e.g. connected-component maps) np.bincount is far cheaper: ~37x faster on a 64k-component map and ~3x on ~400 labels, while cc3d stays faster for the few-label anatomical-segmentation case. Switch on a cheap arr.max()>256 check to get best-of-both. Verified equal across dtypes / label counts / include_zero; speedtest_volume.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
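The many-label branch described in this commit can be sketched as follows. This is a minimal illustration, not the PR's `np_volume`: the function names and signatures are hypothetical, non-negative integer labels are assumed, and `np.unique` stands in as the reference instead of the cc3d path.

```python
import numpy as np


def volume_bincount(arr: np.ndarray, include_zero: bool = False) -> dict:
    """Voxel count per label from a single np.bincount pass.

    Hypothetical sketch: assumes non-negative integer labels.
    """
    counts = np.bincount(arr.ravel())  # counts[i] == number of voxels with label i
    start = 0 if include_zero else 1
    return {int(i): int(c) for i, c in enumerate(counts) if i >= start and c > 0}


def volume_unique(arr: np.ndarray, include_zero: bool = False) -> dict:
    """Reference implementation used only to check the sketch above."""
    labels, counts = np.unique(arr, return_counts=True)
    return {int(l): int(c) for l, c in zip(labels, counts) if include_zero or l != 0}
```

A dispatcher in the spirit of the commit would call the bincount path only when `arr.max() > 256` and keep the cc3d path otherwise.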
For 3D arrays, two of the three axis extents are derived from a single shared 2D projection (np.any over the contiguous last axis), so only one extra full reduction is needed. ~13-18% faster across 256^3/512^3 and px_dist values; identical slices (verified vs old impl on 2D+3D, incl. empty handling). Generic n-D path unchanged. speedtest_bbox_binary.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
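The shared-projection idea can be sketched like this. The function name is hypothetical, the `px_dist` padding of the real `np_bbox_binary` is left out, and returning `None` for an empty mask is an assumption made for the sketch.

```python
import numpy as np


def bbox_binary_3d(mask: np.ndarray):
    """Bounding-box slices of a 3D binary mask (hypothetical sketch, no padding)."""
    mask = np.asarray(mask, dtype=bool)
    # One reduction over the contiguous last axis; both the axis-0 and axis-1
    # extents are then read from this small 2D projection.
    proj = mask.any(axis=2)
    hits0 = np.flatnonzero(proj.any(axis=1))
    if hits0.size == 0:
        return None  # empty mask (assumed convention for this sketch)
    hits1 = np.flatnonzero(proj.any(axis=0))
    # The last-axis extent needs the one extra full reduction.
    hits2 = np.flatnonzero(mask.any(axis=(0, 1)))
    return tuple(slice(int(h[0]), int(h[-1]) + 1) for h in (hits0, hits1, hits2))
```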
…ding_boxes np_center_of_mass and np_bounding_boxes built a `unique` list then filtered with `idx in unique` (O(max_label x n_unique)). Check voxel_counts[idx] directly instead. ~2.2x faster at ~4k labels, ~1.3x at ~2k, unchanged for few labels; identical output (use_crop preserved). speedtest_center_of_mass.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
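The difference between the two filters can be sketched in isolation. The helper names below are hypothetical and `voxel_counts` is assumed to be a per-label count array indexed by label id, as the commit message suggests.

```python
import numpy as np


def present_labels_slow(voxel_counts):
    """Old pattern: build a list, then test membership with `in` for every index."""
    unique = [i for i, c in enumerate(voxel_counts) if c > 0]
    # `idx in unique` is a linear list scan, so this is O(max_label * n_unique).
    return [idx for idx in range(len(voxel_counts)) if idx in unique]


def present_labels_fast(voxel_counts):
    """New pattern: check the count for each index directly, O(max_label)."""
    return [idx for idx in range(len(voxel_counts)) if voxel_counts[idx] > 0]
```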
…+mask The per-label per-iteration 'data = out.copy(); data[i != data] = 0' was a full array copy plus a masked write. _binary_dilation casts its input to bool anyway, so 'data = out == i' is bit-exact (verified across 6480 configs) and skips the copy. ~11-18% faster across n_pixel/connectivity on few-label 150^3 segs. The n_pixel loop is kept (it encodes iterative inter-label competition that a single larger-kernel pass would not reproduce). speedtest_dilate_vectorized.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
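A small check of the equivalence claimed above, assuming the label `i` is non-zero and that the consumer casts its input to bool as `_binary_dilation` is said to do. The helper names are hypothetical.

```python
import numpy as np


def label_mask_old(out: np.ndarray, i: int) -> np.ndarray:
    """Old pattern: full copy plus masked write, then the bool cast."""
    data = out.copy()
    data[i != data] = 0
    return data.astype(bool)


def label_mask_new(out: np.ndarray, i: int) -> np.ndarray:
    """New pattern: a single comparison that already yields a bool mask."""
    return out == i
```

For `i == 0` the two differ (the old form yields an all-False mask), so the equivalence only holds for foreground labels.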
np.isin is 3-6x slower than a boolean lookup table for multi-label membership on uint segmentation masks. New np_isin() builds lut[labels]=True and gathers lut[arr] for unsigned arrays with a small label range; it special-cases the single-label (arr==label) case and falls back to np.isin for signed/negative/huge-range inputs. Verified equal to np.isin across 132 dtype/label/invert cases. Applied at the 9 multi-label np.isin sites (extract_label, erode/dilate (+euclid), connected_components, filter_connected_components). numpy's own kind='table' did not help. speedtest_isin_lut.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
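The lookup-table approach can be sketched as below. This is not the PR's `np_isin`: the name `isin_lut`, the `max_range` threshold, and the exact fallback conditions are assumptions chosen to mirror the description.

```python
import numpy as np


def isin_lut(arr: np.ndarray, labels, invert: bool = False, max_range: int = 1 << 16):
    """Membership test via a boolean lookup table (hypothetical sketch)."""
    labels = np.atleast_1d(np.asarray(labels))
    if labels.size == 1:
        # Single label: a plain comparison is cheaper than any table.
        mask = arr == labels[0]
        return ~mask if invert else mask
    lut_ok = (
        arr.dtype.kind == "u"
        and arr.size > 0
        and labels.size > 0
        and labels.dtype.kind in "iu"
        and int(labels.min()) >= 0
    )
    if lut_ok:
        hi = max(int(arr.max()), int(labels.max()))
        lut_ok = hi < max_range  # assumed cut-off for a "small label range"
    if not lut_ok:
        # Signed, negative, empty, or huge-range input: defer to NumPy.
        return np.isin(arr, labels, invert=invert)
    lut = np.zeros(hi + 1, dtype=bool)
    lut[labels.astype(np.intp)] = True
    out = lut[arr]  # one gather over the whole array
    return ~out if invert else out
```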
The keep_label path called get_seg_array() twice (two full array copies) plus np_extract_label and a multiply. Now it takes a single copy and zeros voxels not in the label set via np_isin, keeping original label values. ~1.74x faster on a 300^3 mask; output identical (verified scalar+list, keep+binary paths). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-label 'seg_arr[seg_arr == l] = 0' loop costs one full pass per removed
label (linear in label count). A single np_map_labels gather ({label: fill})
costs one pass regardless of label count: tied with the loop for a few labels,
~2.2x faster at 20 labels (sparse) and ~6x on dense masks. Enums are now
resolved to .value like extract_label does (the int path is unchanged).
Verified equal across scalar/list/nested labels and removed_to_label values.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
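The loop-versus-gather trade-off can be sketched as follows. The gather below is a hand-written identity table, not the library's `np_map_labels`; both function names are hypothetical, and a non-empty array of non-negative integer labels is assumed.

```python
import numpy as np


def remove_labels_loop(seg: np.ndarray, labels, fill: int = 0) -> np.ndarray:
    """One full comparison pass and masked write per removed label."""
    out = seg.copy()
    for l in labels:
        out[out == l] = fill
    return out


def remove_labels_gather(seg: np.ndarray, labels, fill: int = 0) -> np.ndarray:
    """One gather through an identity table with the removed entries overwritten."""
    lut = np.arange(int(seg.max()) + 1, dtype=seg.dtype)
    # Labels outside the table cannot occur in seg, so they are skipped.
    valid = np.array([l for l in labels if 0 <= l < lut.size], dtype=np.intp)
    lut[valid] = fill
    return lut[seg]
```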
… verbose The two np_unique full-array scans only feed the verbose log line. Guarding them behind 'verbose' makes the common in-loop verbose=False path ~5x faster on a 300^3 mask. The verbose=True output is unchanged; the returned data is identical either way. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
It called extract_label(...).get_seg_array() twice (each a full round-trip: copy + np_extract_label + NII construction + copy) just to get two binary masks. Both masks now come directly from the single get_array() via np_isin. ~2.15x faster on a 300^3 mask; output identical (verified across idx/not_beyond/axis/inclusion). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mass pass The default _crop path looped over every label doing extract_label(i) + compute_crop + scipy center_of_mass. np_center_of_mass (cc3d) returns every label's centroid in a single pass. ~5x faster at 8-16 labels and ~9x at 20-40 labels; output bit-identical (verified 379/379 points exact to the rounded decimal). The non-_crop fallback is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
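The single-pass idea can be illustrated without cc3d. The sketch below swaps in weighted `np.bincount` calls as a stand-in technique, so it is not the PR's `np_center_of_mass`; the function name is hypothetical and non-negative integer labels are assumed.

```python
import numpy as np


def centroids_single_pass(seg: np.ndarray) -> dict:
    """Centroid of every non-zero label without a per-label extraction loop."""
    flat = seg.ravel()
    counts = np.bincount(flat)
    # Per-voxel coordinates along each axis, flattened in the same order as `flat`.
    # This materialises ndim full-size index arrays, which is fine for a sketch.
    coords = np.indices(seg.shape).reshape(seg.ndim, -1)
    sums = np.stack(
        [np.bincount(flat, weights=c, minlength=counts.size) for c in coords],
        axis=1,
    )
    return {int(l): sums[l] / counts[l] for l in np.flatnonzero(counts) if l != 0}
```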
POI_Global construction (to_global) and to_other applied the affine transform one point at a time in a Python loop. Added vectorized local_to_global_arr (POI) and global_to_local_arr (Has_Grid) that transform an (N,3) array in a single matmul, and use them in those loops. ~7-8x faster (100-400 points); output bit-identical (verified vs per-point, with/without itk_coords). to_other keeps the per-point path when verbose=True to preserve its logging. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
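The batched transform can be sketched with plain NumPy. The functions below are hypothetical stand-ins for `local_to_global_arr` and `global_to_local_arr`; they assume a 4x4 homogeneous affine and ignore the `itk_coords` handling mentioned above.

```python
import numpy as np


def local_to_global_loop(affine: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Per-point reference: one homogeneous matrix-vector product per point."""
    return np.array([(affine @ np.append(p, 1.0))[:3] for p in points])


def local_to_global_batch(affine: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Transform an (N, 3) array in a single matmul plus a broadcast translation."""
    points = np.asarray(points, dtype=float)
    return points @ affine[:3, :3].T + affine[:3, 3]


def global_to_local_batch(affine: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Inverse mapping using the inverted affine, again for all points at once."""
    inv = np.linalg.inv(affine)
    return np.asarray(points, dtype=float) @ inv[:3, :3].T + inv[:3, 3]
```

The sketch only claims agreement up to floating-point tolerance; the bit-identical result reported in the commit refers to the PR's own implementation.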
flatten-mode filtering did candidates.copy() then list.remove() per dropped file (each remove is O(n) -> O(n^2) overall). Replaced with a single list comprehension; ~48x faster filtering 2000 candidates. The dict-mode branches likewise drop the throwaway dict copy()+pop() for a dict comprehension. Output identical (verified across flatten/dict x keys for both filter methods). The comprehension also removes by identity, avoiding list.remove's first-equal removal quirk. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
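The two filtering patterns can be compared in isolation. The function names and the `keep` predicate are hypothetical; the real code filters BIDS file candidates.

```python
def filter_remove(candidates: list, keep) -> list:
    """Old pattern: copy, then list.remove() per dropped item (O(n) each)."""
    out = candidates.copy()
    for c in candidates:
        if not keep(c):
            out.remove(c)  # removes the first element that compares equal to c
    return out


def filter_comprehension(candidates: list, keep) -> list:
    """New pattern: a single O(n) pass that keeps items by position."""
    return [c for c in candidates if keep(c)]
```

The two agree whenever no dropped item compares equal to an earlier kept one. When that does happen, for example with `[1, 1.0]` and a predicate that keeps only `int`, `list.remove` drops the first equal element rather than the one being tested, which is the quirk the commit mentions.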
_get_mesh called from_segmentation_nii(extract_label(u)) for every label, and that reorients + rescales (resamples) the image each time. Reorient/rescale commute with extract_label for nearest-neighbour segmentation resampling, so the image is now transformed once before the loop. ~5x (12 labels) to ~7x (25 labels) faster on the transform; the per-label marching-cubes meshes are bit-identical (verified arrays and mesh vertices for rescale_to_iso True/False). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pull request overview
This PR focuses on performance optimizations across segmentation/label-array utilities, POI coordinate transforms, mesh preview generation, and BIDS candidate filtering, and adds a set of speedtest scripts to benchmark the proposed improvements.
Changes:
- Introduces faster label/segmentation primitives (`np_isin` LUT path, `np_volume` heuristic, faster `np_center_of_mass`/`np_bounding_boxes`, 3D-specialized `np_bbox_binary`, and reduced-copy `np_dilate_msk` inner loop).
- Vectorizes POI coordinate conversions and accelerates centroid computation by using a single cc3d statistics pass.
- Optimizes higher-level workflows (mesh preview label loop, NII label operations, BIDS filter loops) and adds multiple benchmarking scripts under `tests/speedtests/`.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| TPTBox/tests/speedtests/speedtest_volume.py | New benchmark for np_volume implementations across label-count regimes. |
| TPTBox/tests/speedtests/speedtest_poi_to_global.py | New benchmark for batched POI local→global conversion. |
| TPTBox/tests/speedtests/speedtest_poi_calc_centroids.py | New benchmark for centroid computation approaches. |
| TPTBox/tests/speedtests/speedtest_nii_truncate_masks.py | New benchmark for mask extraction optimization in truncation logic. |
| TPTBox/tests/speedtests/speedtest_nii_remove_labels.py | New benchmark for remove_labels implementations (loop vs map/isin). |
| TPTBox/tests/speedtests/speedtest_nii_map_labels.py | New benchmark for avoiding np_unique scans when verbose=False. |
| TPTBox/tests/speedtests/speedtest_nii_extract_label_keep.py | New benchmark for extract_label(..., keep_label=True) optimization. |
| TPTBox/tests/speedtests/speedtest_mesh_preview_hoist.py | New benchmark for hoisting reorient/rescale outside per-label mesh loop. |
| TPTBox/tests/speedtests/speedtest_isin_lut.py | New benchmark comparing np.isin modes vs explicit LUT. |
| TPTBox/tests/speedtests/speedtest_dilate_vectorized.py | New benchmark for reduced-copy dilation inner loop (out == i). |
| TPTBox/tests/speedtests/speedtest_center_of_mass.py | New benchmark for direct voxel-count filtering in cc3d stats postprocessing. |
| TPTBox/tests/speedtests/speedtest_bids_filter.py | New benchmark for O(n) list comprehension filter vs O(n²) remove loop. |
| TPTBox/tests/speedtests/speedtest_bbox_binary.py | New benchmark for 3D np_bbox_binary 2-pass specialization. |
| TPTBox/mesh3D/html_preview.py | Hoists reorient/rescale once for per-label mesh generation. |
| TPTBox/core/poi.py | Adds local_to_global_arr and speeds up calc_centroids (cc3d-based path). |
| TPTBox/core/poi_fun/poi_global.py | Uses batched affine/inverse-affine conversions when not verbose. |
| TPTBox/core/np_utils.py | Adds np_isin; updates multiple utilities to use it; optimizes volume/COM/bbox/dilate/bbox_binary. |
| TPTBox/core/nii_wrapper.py | Uses np_isin in truncation/extract-label; avoids verbose-only scans; speeds remove_labels via np_map_labels. |
| TPTBox/core/nii_poi_abstract.py | Adds global_to_local_arr vectorized conversion. |
| TPTBox/core/bids_files.py | Replaces copy+remove loops with comprehensions (flatten and dict modes). |
Comment on lines 471 to +475
      else:
          arrc = arr
      if labels is not None:
          arrc = arrc.copy()
    -     arrc[np.isin(arr_bin, labels, invert=True)] = 0
    +     arrc[np_isin(arr_bin, labels, invert=True)] = 0
robert-graf
reviewed
Jun 12, 2026
    eq = lambda x, y: x == y  # noqa: E731

    for n_labels in (100, 400):
How much time does this save? POI resampling is usually negligibly fast.
Fix the Copilot comment and my int comment. Rest LGTM.
No description provided.